244 ◾ Bioinformatics
bedtools getfasta \
-fi ref/hg19.fa \
-bed motifs/chip3_peaks.bed \
-fo motifs/chip3_peaks.fasta
Those three FASTA files contain the sequences of the enriched peaks for each sample, and we
will use them as inputs for the motif detection programs.
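Before running motif discovery, it can be worth confirming that each FASTA file is non-empty and noting how many sequences it holds. The following is a minimal sketch, not part of the original workflow: the helper name count_fasta_records is hypothetical, and the glob assumes the output files follow the motifs/*_peaks.fasta naming used above.

```shell
# Hypothetical helper: count FASTA records by counting header lines (lines starting with ">").
count_fasta_records() {
    # grep -c prints 0 and exits non-zero when nothing matches; "|| true" keeps the exit status clean.
    grep -c '^>' "$1" || true
}

# Assumed naming pattern, based on the -fo path used in the command above.
for fasta in motifs/*_peaks.fasta; do
    [ -e "$fasta" ] || continue          # skip if the glob matched nothing
    printf '%s\t%s\n' "$fasta" "$(count_fasta_records "$fasta")"
done
```

The count is also useful when choosing a motif discovery program, since the recommended tool depends on the number of input sequences.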
There are two approaches to motif detection: a de novo method, which assumes no prior
information, and a position weight matrix (PWM) method for known motifs.
The de novo approach searches for motifs in the input FASTA sequences without prior
information about the motifs. The search is conducted in a window around the peak.
Motif discovery programs either break the sequences into k-mers and search exhaustively
for the most frequent consensus substrings, or iteratively align the sequences to build a
PWM and report the consensus of the most frequent nucleobases at each position as the
motif. An example of a de novo motif discovery toolkit is the MEME Suite [11], which
includes the DREME, MEME, and STREME programs for discovering ungapped motifs.
DREME is k-mer based, but it is deprecated and will not be supported in the future.
MEME is an alignment-based motif discovery tool, but it is recommended for motif
discovery in fewer than 50 sequences. STREME is k-mer based and is recommended for
detecting motifs in datasets with more than 50 sequences. The MEME Suite is available
as a web server and as command-line programs. To use the web server or to download and
install the MEME Suite, visit “https://meme-suite.org/meme/”. On Linux, you can
download and install the MEME Suite with the following steps:
wget https://meme-suite.org/meme/meme-software/5.4.1/meme-5.4.1.tar.gz
tar vxf meme-5.4.1.tar.gz
cd meme-5.4.1
./configure --prefix=$HOME/meme --enable-build-libxml2 \
--enable-build-libxslt
make
make test
make install
Once you have installed it, add the following line to your “.bashrc” file:
export PATH=$HOME/meme/bin:$HOME/meme/libexec/meme-5.4.1:$PATH
The version may change, so it is best to check the MEME Suite website for the latest
installation instructions.
After adding the above line to the “.bashrc” file, you may need to restart the terminal or
run “source ~/.bashrc” for the change to take effect.
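A quick way to confirm that the PATH change took effect is to ask the shell where it finds the programs. The sketch below is illustrative only: tool_on_path is a hypothetical helper, and it assumes the installed executables are named meme and streme, as the program names above suggest.

```shell
# Hypothetical helper: succeed if the named command can be found on the current PATH.
tool_on_path() {
    command -v "$1" >/dev/null 2>&1
}

# Assumed executable names, taken from the program names mentioned above.
for tool in meme streme; do
    if tool_on_path "$tool"; then
        echo "$tool found at $(command -v "$tool")"
    else
        echo "$tool not found; check the PATH line in ~/.bashrc"
    fi
done
```

If a program is reported as not found, the most likely causes are a version number in the PATH line that does not match the installed release, or a terminal session that has not yet reloaded “.bashrc”.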
The MEME Suite programs require the ChIP-Seq dataset in FASTA (primary data-
set) and control dataset (secondary dataset). If no control dataset is used, MEME Suite